100 research outputs found

Recommended from our members

Using attribution to decode binding mechanism in neural network models for chemistry

Author: Colwell Lucy
Publication venue: Proceedings of the National Academy of Sciences of the United States of America
Publication date: 11/06/2019

Deep neural networks have achieved state-of-the-art accuracy at classifying molecules with respect to whether they bind to specific protein targets. A key breakthrough would occur if these models could reveal the fragment pharmacophores that are causally involved in binding. Extracting chemical details of binding from the networks could enable scientific discoveries about the mechanisms of drug actions. However, doing so requires shining light into the black box that is the trained neural network model, a task that has proved difficult across many domains. Here we show how the binding mechanism learned by deep neural network models can be interrogated, using a recently described attribution method. We first work with carefully constructed synthetic datasets, in which the molecular features responsible for “binding” are fully known. We find that networks that achieve perfect accuracy on held-out test datasets still learn spurious correlations, and we are able to exploit this nonrobustness to construct adversarial examples that fool the model. This makes these models unreliable for accurately revealing information about the mechanisms of protein–ligand binding. In light of our findings, we prescribe a test that checks whether a hypothesized mechanism can be learned. If the test fails, it indicates that the model must be simplified or regularized and/or that the training dataset requires augmentation.

M.P.B. gratefully acknowledges support from the National Science Foundation through NSF-DMS1715477, as well as support from the Simons Foundation. L.J.C. gratefully acknowledges a Next Generation fellowship, a Marie Curie Career Integration Grant (Evo-Couplings, 631609), and support from the Simons Foundation. F.M. performed work during an internship at Google.

Apollo (Cambridge)

Protein sectors: statistical coupling analysis versus conservation

Authors: Colwell Lucy J.; Leibler Stanislas; Tesileanu Tiberiu
Publication venue: Public Library of Science (PLoS)
Publication date: 16/12/2014

Statistical coupling analysis (SCA) is a method for analyzing multiple sequence alignments that was used to identify groups of coevolving residues termed "sectors". The method applies spectral analysis to a matrix obtained by combining correlation information with sequence conservation. It has been asserted that the protein sectors identified by SCA are functionally significant, with different sectors controlling different biochemical properties of the protein. Here we reconsider the available experimental data and note that it involves almost exclusively proteins with a single sector. We show that in this case sequence conservation is the dominating factor in SCA, and can alone be used to make statistically equivalent functional predictions. Therefore, we suggest shifting the experimental focus to proteins for which SCA identifies several sectors. Correlations in protein alignments, which have been shown to be informative in a number of independent studies, would then be less dominated by sequence conservation.

Comment: 36 pages, 17 figures

arXiv.org e-Print Archive

Public Library of Science (PLOS)

City University of New York

Directory of Open Access Journals

PubMed Central

FigShare

Statistical and machine learning approaches to predicting protein-ligand interactions.

Author: Colwell Lucy J
Publication venue: Curr Opin Struct Biol
Publication date: 20/02/2018

Data-driven computational approaches to predicting protein-ligand binding are currently achieving unprecedented levels of accuracy on held-out test datasets. Up until now, however, this has not led to corresponding breakthroughs in our ability to design novel ligands for protein targets of interest. This review summarizes the current state of the art in this field, emphasizing the recent development of deep neural networks for predicting protein-ligand binding. We explain the major technical challenges that have caused difficulty with predicting novel ligands, including the problems of sampling noise and the challenge of using benchmark datasets that are sufficiently unbiased that they allow the model to extrapolate to new regimes.

Apollo (Cambridge)

Comparative analysis of nanobody sequence and structure data.

Authors: Colwell Lucy J; Mitchell Laura S
Publication venue: Proteins
Publication date: 01/07/2018

Nanobodies are a class of antigen-binding protein derived from camelids that achieve comparable binding affinities and specificities to classical antibodies, despite comprising only a single 15 kDa variable domain. Their reduced size makes them an exciting target molecule with which we can explore the molecular code that underpins binding specificity: how is such high specificity achieved? Here, we use a novel dataset of 90 nonredundant, protein-binding nanobodies with antigen-bound crystal structures to address this question. To provide a baseline for comparison we construct an analogous set of classical antibodies, allowing us to probe how nanobodies achieve high specificity binding with a dramatically reduced sequence space. Our analysis reveals that nanobodies do not diversify their framework region to compensate for the loss of the VL domain. In addition to the previously reported increase in H3 loop length, we find that nanobodies create diversity by drawing their paratope regions from a significantly larger set of aligned sequence positions, and by exhibiting greater structural variation in their H1 and H2 loops.

Apollo (Cambridge)

Charge as a Selection Criterion for Translocation through the Nuclear Pore Complex

Authors: Brenner Michael P.; Colwell Lucy J.; Ribbeck Katharina
Publication venue: Public Library of Science (PLoS)
Publication date: 01/09/2009

Nuclear pore complexes (NPCs) are highly selective filters that control the exchange of material between nucleus and cytoplasm. The principles that govern selective filtering by NPCs are not fully understood. Previous studies find that cellular proteins capable of fast translocation through NPCs (transport receptors) are characterized by a high proportion of hydrophobic surface regions. Our analysis finds that transport receptors and their complexes are also highly negatively charged. Moreover, NPC components that constitute the permeability barrier are positively charged. We estimate that electrostatic interactions between a transport receptor and the NPC result in an energy gain of several kBT, which would enable significantly increased translocation rates of transport receptors relative to other cellular proteins. We suggest that negative charge is an essential criterion for selective passage through the NPC.

Merck Research Laboratories; National Science Foundation (U.S.) (Division of Mathematical Sciences); Kavli Institute for Bionano Science & Technology at Harvard University; National Centers for Systems Biology (U.S.) (NIGMS grant GM068763); National Institute of General Medical Sciences (U.S.)

DSpace@MIT

Directory of Open Access Journals

PubMed Central

Conservation Weighting Functions Enable Covariance Analyses to Detect Functionally Important Amino Acids

Authors: Brenner Michael P.; Colwell Lucy J.; Murray Andrew W.
Publication venue: Public Library of Science (PLoS)
Publication date: 07/11/2014

The explosive growth in the number of protein sequences gives rise to the possibility of using the natural variation in sequences of homologous proteins to find residues that control different protein phenotypes. Because in many cases different phenotypes are each controlled by a group of residues, the mutations that separate one version of a phenotype from another will be correlated. Here we incorporate biological knowledge about protein phenotypes and their variability in the sequence alignment of interest into algorithms that detect correlated mutations, improving their ability to detect the residues that control those phenotypes. We demonstrate the power of this approach using simulations and recent experimental data. Applying these principles to the protein families encoded by Dscam and Protocadherin allows us to make testable predictions about the residues that dictate the specificity of molecular interactions.

CiteSeerX

Harvard University - DASH

Directory of Open Access Journals

PubMed Central

FigShare

Inferring interaction partners from protein sequences.

Authors: Bitbol Anne-Florence; Colwell Lucy J; Dwyer Robert S; Wingreen Ned S
Publication venue: Proc Natl Acad Sci U S A
Publication date: 23/09/2016

Specific protein-protein interactions are crucial in the cell, both to ensure the formation and stability of multiprotein complexes and to enable signal transduction in various pathways. Functional interactions between proteins result in coevolution between the interaction partners, causing their sequences to be correlated. Here we exploit these correlations to accurately identify, from sequence data alone, which proteins are specific interaction partners. Our general approach, which employs a pairwise maximum entropy model to infer couplings between residues, has been successfully used to predict the 3D structures of proteins from sequences. Thus inspired, we introduce an iterative algorithm to predict specific interaction partners from two protein families whose members are known to interact. We first assess the algorithm's performance on histidine kinases and response regulators from bacterial two-component signaling systems. We obtain a striking 0.93 true positive fraction on our complete dataset without any a priori knowledge of interaction partners, and we uncover the origin of this success. We then apply the algorithm to proteins from ATP-binding cassette (ABC) transporter complexes, and obtain accurate predictions in these systems as well. Finally, we present two metrics that accurately distinguish interacting protein families from noninteracting ones, using only sequence data.

Human Frontier Science Program, National Institutes of Health (Grant ID: R01-GM082938), National Science Foundation (Grant ID: PHY-1305525), Marie Curie (Career Integration Grant ID: 631609), Next Generation Fellowship, Eric and Wendy Schmidt Transformative Technology Fund.

This is the author accepted manuscript. The final version is available from the Proceedings of the National Academy of Sciences of the United States of America via https://doi.org/10.1073/pnas.160676211

arXiv.org e-Print Archive

Princeton University Open Access Repository

PubMed Central

Apollo (Cambridge)